Automatic assessment of voice quality using fundamental harmonic normalised spectra and Gaussian mixtures
Abstract
Classification of speech data from male volunteers (normal) and patients recovering from cancer of the larynx (abnormal) is discussed. Analysis of normals and abnormals has shown a significant difference between the two groups in the fundamental frequency and harmonic envelope during constant phonation of vowel sounds. This work proposes a method of deriving the Fundamental-Harmonic Normalised (FHN) spectrum from the speech data and fitting a mixture of Gaussians to model the distribution of power within the FHN spectrum. The aim of this work is to provide a set of features for subsequent classification using an Artificial Neural Network (ANN).

Introduction
An increasingly important factor in prescribing treatment for cancer of the larynx is the quality of voice retained post-therapy. Current techniques for analysis of voice quality following treatment of cancer of the larynx are slow, mainly subjective, and based on limited numbers of retrospective studies. A method of accurately measuring voice quality in cancer patients with respect to a standard normal voice quality is required to enable speech and language therapists (SALTs) to provide an objective assessment during clinical evaluation. Earlier work has shown that a Multi-Layer Perceptron (MLP) trained using raw power spectral data can accurately classify speech signals as normal or abnormal [1]. The aim of this research is to derive an improved set of features to better model the frequency/power distribution whilst also reducing the dimensionality of the data. A further constraint is that the feature set must provide equal or better classification accuracy of voice quality for patients recovering from cancer of the larynx.

Data Capture
Data was captured under clinical conditions at the Christie and Withington Hospitals in Manchester. The tool used to capture speech sequences was the Electrolaryngograph PCLX system [2].
This system captures electrical impedance signals, using impedance pads placed either side of the neck, synchronously with acoustic signals from a microphone. The Electrolaryngograph provides four-channel 16-bit analogue-to-digital conversion and two-channel 16-bit digital-to-analogue conversion. A TI TMS320C25 50 MHz DSP chip carries out digital signal processing functions, e.g. sampling, filtering and quantisation. Impedance data channels were captured synchronously at 20 kHz for up to 3 seconds while the subject phonated the vowel /i/ as steadily as possible.

Feature Extraction
Determining the most suitable features for analysis of voice quality is a non-trivial process and many have been proposed [3,4,5]. Features such as jitter, shimmer, normalised noise energy and other measures have been used in an attempt to provide an overall evaluation of vocal function. Following discussions with a SALT it was concluded that their expert knowledge was related to subtle variations in the frequency structure of a patient's stylised speech. It has already been shown that the frequency structure of speech recordings can be used to classify speech quality [6]. A method of separating the spectral envelope (containing the harmonic and formant frequencies) from the distribution of the fundamental frequency and its harmonics has now been developed.

First, the impedance time series is transformed to stationarity prior to further processing. The autocovariance of a 1000-point frame is multiplied by a Hanning window to suppress variance at increasing lag and then transformed into the frequency domain using the Fast Fourier Transform (FFT). The Power Spectral Density (PSD) is normalised relative to the fundamental frequency and its harmonics to produce the Fundamental Harmonic Normalised (FHN) spectrum. An example of the FHN spectrum is shown in Figure 1.

Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA 1999), Firenze, Italy, September 1-3, 1999. ISCA Archive, http://www.isca-speech.org/archive
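The feature-extraction chain described under Feature Extraction can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: all function names are hypothetical; stationarity is approximated by mean removal; the fundamental frequency is estimated from the autocovariance peak in an assumed 60-400 Hz range; "normalised relative to the fundamental" is interpreted as rescaling the frequency axis to harmonic number and scaling the power to unit sum; and the mixture is fitted with a plain weighted EM loop using three components. The frame length (1000 points) and sampling rate (20 kHz) follow the text.

```python
import numpy as np

def autocovariance(frame, max_lag):
    """Biased sample autocovariance for lags 0..max_lag (mean removed)."""
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()
    n = x.size
    full = np.correlate(x, x, mode="full")        # lags -(n-1)..(n-1)
    return full[n - 1 : n + max_lag] / n

def windowed_psd(frame, fs):
    """PSD estimate: FFT of the Hanning-tapered autocovariance."""
    max_lag = len(frame) - 1
    acov = autocovariance(frame, max_lag)
    # Decaying half of a Hanning window: weight 1 at lag 0, 0 at the largest lag.
    taper = np.hanning(2 * max_lag + 1)[max_lag:]
    tapered = acov * taper
    # Rebuild the symmetric sequence (lags 0..M, then -M..-1) so the FFT is real.
    sym = np.concatenate([tapered, tapered[-1:0:-1]])
    psd = np.fft.rfft(sym).real
    freqs = np.fft.rfftfreq(sym.size, d=1.0 / fs)
    return freqs, np.clip(psd, 0.0, None)         # clip small negative side lobes

def estimate_f0(frame, fs, fmin=60.0, fmax=400.0):
    """Assumed f0 estimator: lag of the autocovariance peak in [fmin, fmax]."""
    lo, hi = int(fs / fmax), int(fs / fmin)
    acov = autocovariance(frame, hi)
    return fs / (lo + int(np.argmax(acov[lo : hi + 1])))

def fhn_spectrum(freqs, psd, f0, n_harmonics=10, points_per_harmonic=16):
    """Assumed FHN form: PSD resampled on a harmonic-number axis, unit sum."""
    grid = np.linspace(0.0, n_harmonics, n_harmonics * points_per_harmonic + 1)
    resampled = np.interp(grid * f0, freqs, psd)
    return grid, resampled / resampled.sum()

def fit_gaussian_mixture(grid, weights, n_components=3, n_iter=50, var_floor=1e-3):
    """Weighted 1-D EM that treats `weights` as probability mass on `grid`."""
    w = weights / weights.sum()
    span = grid.max() - grid.min()
    means = np.linspace(grid.min(), grid.max(), n_components + 2)[1:-1]
    variances = np.full(n_components, (span / n_components) ** 2)
    mix = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each grid point.
        diff = grid[:, None] - means
        dens = np.exp(-0.5 * diff ** 2 / variances) / np.sqrt(2 * np.pi * variances)
        resp = mix * dens
        resp /= resp.sum(axis=1, keepdims=True) + 1e-300
        # M-step: moments weighted by the spectral mass at each grid point.
        mass = (w[:, None] * resp).sum(axis=0) + 1e-12
        means = (w[:, None] * resp * grid[:, None]).sum(axis=0) / mass
        diff = grid[:, None] - means
        variances = np.maximum((w[:, None] * resp * diff ** 2).sum(axis=0) / mass, var_floor)
        mix = mass / mass.sum()
    return mix, means, variances

# Synthetic stand-in for one 1000-point frame sampled at 20 kHz (not real data).
fs = 20_000
t = np.arange(1000) / fs
frame = sum(a * np.sin(2 * np.pi * k * 125.0 * t) for k, a in [(1, 1.0), (2, 0.5), (3, 0.25)])

f0 = estimate_f0(frame, fs)
freqs, psd = windowed_psd(frame, fs)
grid, fhn = fhn_spectrum(freqs, psd, f0)
mix, means, variances = fit_gaussian_mixture(grid, fhn)
print("f0 estimate (Hz):", f0)
print("mixture weights, means, variances:", mix, means, variances)
```

Under these assumptions the mixture weights, means and variances form a short feature vector per frame, which is the kind of low-dimensional input the abstract proposes passing to a classifier in place of the raw power spectrum.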
Similar resources
Using Context-based Statistical Models to Promote the Quality of Voice Conversion Systems
This article aims to examine methods of optimizing GMM-based voice conversion systems performance in which GMM method is introduced as the basic method for improvement of voice conversion systems performance. In the current methods, due to using a single conversion function to convert all speech units and subsequent spectral smoothing arising from statistical averaging, we will observe quality ...
The Study of Vocal Function in Patients With Early Laryngeal Carcinoma After Transoral Laser Microsurgery
Objective Today transoral laser microsurgery is considered as one of the first options to control early laryngeal cancer, and voice disorder is one of the inevitable complications of this therapeutic component. This study aimed to compare the vocal function in patients with early-stage laryngeal cancer following laser surgery with healthy individuals with normal voice quality using acoustic ana...
Immediate effects of vocal warm-up exercises on elementary teachers' voice
Introduction: Teachers are a large group of professional voice users who are exposed to many voice problems. Vocal warm-up exercises (VWUE) can prepare the muscles involved in vocalization before teaching and can reduce voice damage in teachers. However, limited studies have examined the effects of VWUE on teachers' voices. Therefore, the present study was conducted to investigate the immediate...
Effect of Functional Endoscopic Sinus Surgery on the Voice Quality among Patients with Rhinosinus Polyposis
Introduction: Rhinosinus polyposis is associated with voice quality reduction. There has been little evidence about the efficacy of rhinosinus polyps surgery on patients' voice quality so far. The aim of the present study was to evaluate the nasality and acoustic voice changes after rhinosinus polyposis surgery. Materials and Methods: The population in this study compo...
Acoustic Voice Measures in Benign Mass Lesions
Objectives: The present study aims to compare acoustic voice parameters in patients with vocal cord nodules, polyps, and normal subjects. Methods: In this cross-sectional case-control study, the participants were selected by convenience sampling, including 30 patients with vocal polyps group, 38 patients with vocal nodules for the second group, and 42 participants without voice pathologies a...
Publication date: 1999